Non-redundant Subgroup Discovery Using a Closure System

نویسندگان

  • Mario Boley
  • Henrik Grosskreutz
چکیده

Subgroup discovery is a local pattern discovery task, in which descriptions of subpopulations of a database are evaluated against some quality function. As standard quality functions are functions of the described subpopulation, we propose to search for equivalence classes of descriptions with respect to their extension in the database rather than individual descriptions. These equivalence classes have unique maximal representatives forming a closure system. We show that minimum cardinality representatives of each equivalence class can be found during the enumeration process of that closure system without additional cost, while finding a minimum representative of a single equivalence class is NP-hard. With several real-world datasets we demonstrate that search space and output are significantly reduced by considering equivalence classes instead of individual descriptions and that the minimum representatives constitute a family of subgroup descriptions that is of same or better expressive power than those generated by traditional methods.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Discovering statistically non-redundant subgroups

The objective of subgroup discovery is to find groups of individuals who are statistically different from others in a large data set. Most existing measures of the quality of subgroups are intuitive and do not precisely capture statistical differences of a group with the other, and their discovered results contain many redundant subgroups. Odds ratio is a statistically sound measure to quantify...

متن کامل

Non-redundant Subgroup Discovery in Large and Complex Data

Large and complex data is challenging for most existing discovery algorithms, for several reasons. First of all, such data leads to enormous hypothesis spaces, making exhaustive search infeasible. Second, many variants of essentially the same pattern exist, due to (numeric) attributes of high cardinality, correlated attributes, and so on. This causes top-k mining algorithms to return highly red...

متن کامل

Groups in which every subgroup has finite index in its Frattini closure

‎In 1970‎, ‎Menegazzo [Gruppi nei quali ogni sottogruppo e intersezione di sottogruppi massimali‎, ‎ Atti Accad‎. ‎Naz‎. ‎Lincei Rend‎. ‎Cl‎. ‎Sci‎. ‎Fis‎. ‎Mat‎. ‎Natur. 48 (1970)‎, ‎559--562.] gave a complete description of the structure of soluble $IM$-groups‎, ‎i.e.‎, ‎groups in which every subgroup can be obtained as intersection of maximal subgroups‎. ‎A group $G$ is said to have the $FM$...

متن کامل

Comparative analysis of profit between three dissimilar repairable redundant systems using supporting external device for operation

The importance in promoting, sustaining industries, manufacturing systems and economy through reliability measurement has become an area of interest. The profit of a system may be enhanced using highly reliable structural design of the system or subsystem of higher reliability. On improving the reliability and availability of a system, the production and associated profit will also increase. Re...

متن کامل

MTBF evaluation for 2-out-of-3 redundant repairable systems with common cause and cascade failures considering fuzzy rates for failures and repair: a case study of a centrifugal water pumping system

In many cases, redundant systems are beset by both independent and dependent failures. Ignoring dependent variables in MTBF evaluation of redundant systems hastens the occurrence of failure, causing it to take place before the expected time, hence decreasing safety and creating irreversible damages. Common cause failure (CCF) and cascading failure are two varieties of dependent failures, both l...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009